Environmental Sound Source Identifica Model for Robust Speech

نویسندگان

  • Takanobu Nishiura
  • Satoshi Nakamura
  • Kiyohiro Shikano
چکیده

In real acoustic environments, humans communicate with each other through speech by focusing on the target speech among environmental sounds. We can easily identify the target sound from other environmental sounds. For hands-free speech recognition, the identification of the target speech from environmental sounds is imperative. This mechanism may also be important for a selfmoving robot to sense the acoustic environments and communicate with humans. Therefore, this paper first proposes Hidden Markov Model (HMM)-based environmental sound source identification. Environmental sounds are modeled by three states of HMMs and evaluated using 92 kinds of environmental sounds. The identification accuracy was 95.4%. This paper also proposes a new HMM composition method that composes speech HMMs and an HMM of categorized environmental sounds for robust environmental sound-added speech recognition. As a result of the evaluation experiments, we confirmed that the proposed HMM composition outperforms the conventional HMM composition with speech HMMs and a noise (environmental sound) HMM trained using noise periods prior to the target speech in a captured signal.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Statistical sound source identification in a real acoustic environment for robust speech recognition using a microphone array

It is very important for a hands-free speech interface to capture distant talking speech with high quality. A microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. Conventional talker localization methods in multiple sound source environments not only have difficulty localizing the multiple sound sources accurately, but also have ...

متن کامل

Robust Structured Voice Extraction for Flexible Expressive Resynthesis a Dissertation Submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Parametric representation of audio allows for a reduction in the amount of data needed to represent the sound. If chosen carefully, these parameters can capture the expressiveness of the sound, while reflecting the production mechanism of the sound source, and thus allow for an intuitive control in order to modify the original sound in a desirable way. In order to achieve the desired parametric...

متن کامل

Sound Source Localization Using a Profile Fitting Method with Sound Reflectors

In a two-microphone approach, interchannel differences in time (ICTD) and interchannel differences in sound level (ICLD) have generally been used for sound source localization. But those cues are not effective for vertical localization in the median plane (direct front). For that purpose, spectral cues based on features of head-related transfer functions (HRTF) have been investigated, but they ...

متن کامل

Exploiting correlogram structure for robust speech recognition with multiple speech sources

This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture with the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-l...

متن کامل

A Neural Oscillator Sound Separator for Missing Data Speech Recognition

In order to recognise speech in a background of other sounds, human listeners must solve two perceptual problems. First, the mixture of sounds reaching the ears must be parsed to recover a description of each acoustic source, a process termed ‘auditory scene analysis’. Second, recognition of speech must be robust even when the acoustic evidence is missing due to masking by other sounds. This pa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003